AITopics | best hyperparameter

Data-Driven Temperature Modelling of Machine Tools by Neural Networks: A Benchmark

Coelho, C., Hohmann, M., Fernández, D., Penter, L., Ihlenfeldt, S., Niggemann, O.

arXiv.org Artificial Intelligence, Oct-7-2025

Traditional thermal error correction/compensation methods rely on measured temperature-deformation fields or on transfer functions. Most existing data-driven compensation strategies employ neural networks (NNs) to directly predict thermal errors or specific compensation values. While effective, these approaches are tightly bound to particular error types, spatial locations, or machine configurations, limiting their generality and adaptability. In this work, we introduce a novel paradigm in which NNs are trained to predict high-fidelity temperature and heat flux fields within the machine tool. The proposed framework enables subsequent computation and correction of a wide range of error types using modular, swappable downstream components. The NN is trained using data obtained with the finite element method under varying initial conditions and incorporates a correlation-based selection strategy that identifies the most informative measurement points, minimising hardware requirements during inference. We further benchmark state-of-the-art time-series NN architectures, namely Recurrent NN, Gated Recurrent Unit, Long Short-Term Memory (LSTM), Bidirectional LSTM, Transformer, and Temporal Convolutional Network, by training both specialised models, tailored for specific initial conditions, and general models, capable of extrapolating to unseen scenarios. The results show accurate and low-cost prediction of temperature and heat flux fields, laying the basis for enabling flexible and generalisable thermal error correction in machine tool environments.
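The correlation-based selection of measurement points mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure or selection rule is used, so the Pearson coefficient, the top-k rule, the function names, and the toy sensor data below are all assumptions made for illustration, not the authors' method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_points(candidates, target, k):
    """Rank candidate points by |correlation| with the target series; keep the top k."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], target)),
        reverse=True,
    )
    return ranked[:k]

# Toy data (made up): three hypothetical sensors and one target temperature series.
candidates = {
    "sensor_a": [20.0, 21.0, 22.0, 23.0, 24.0],  # moves with the target
    "sensor_b": [5.0, 5.0, 5.0, 5.0, 5.0],       # constant, uninformative
    "sensor_c": [30.0, 29.0, 31.0, 30.0, 31.0],  # loosely related
}
target = [40.0, 42.0, 44.0, 46.0, 48.0]
selected = select_points(candidates, target, k=2)
```

In this toy case the constant sensor is ranked last, so a setup limited to two physical sensors would drop it.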

artificial intelligence, deep learning, machine learning, (14 more...)

2510.03261

Country:

North America > United States (0.28)
Europe > Germany (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Energy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

6dddcff5b115b40c998a08fbd1cea4d7-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-15-2025, 16:33:23 GMT

apple, exp, inequality, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Benchmarking Partial Observability in Reinforcement Learning with a Suite of Memory-Improvable Domains

Tao, Ruo Yu, Guo, Kaicheng, Allen, Cameron, Konidaris, George

arXiv.org Artificial Intelligence, Aug-4-2025

Mitigating partial observability is a necessary but challenging task for general reinforcement learning algorithms. To improve an algorithm's ability to mitigate partial observability, researchers need comprehensive benchmarks to gauge progress. Most algorithms tackling partial observability are only evaluated on benchmarks with simple forms of state aliasing, such as feature masking and Gaussian noise. Such benchmarks do not represent the many forms of partial observability seen in real domains, like visual occlusion or unknown opponent intent. We argue that a partially observable benchmark should have two key properties. The first is coverage in its forms of partial observability, to ensure an algorithm's generalizability. The second is a large gap between the performance of agents with more or less state information, all other factors roughly equal. This gap implies that an environment is memory improvable: performance gains in the domain come from an algorithm's ability to cope with partial observability as opposed to other factors. We introduce best-practice guidelines for empirically benchmarking reinforcement learning under partial observability, as well as the open-source library POBAX: Partially Observable Benchmarks in JAX. We characterize the types of partial observability present in various environments and select representative environments for our benchmark. These environments include localization and mapping, visual control, games, and more. Additionally, we show that these tasks are all memory improvable and require hard-to-learn memory functions, providing a concrete signal for partial observability research. This framework includes recommended hyperparameters and algorithm implementations for fast, out-of-the-box evaluation, as well as highly performant environments implemented in JAX for GPU-scalable experimentation.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2508.00046

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Education (0.93)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks

Madeddu, Francesco, Testa, Lucia, De Carlo, Gianluca, Pieroni, Michele, Mastropietro, Andrea, Anagnostopoulos, Aris, Tieri, Paolo, Barbarossa, Sergio

arXiv.org Artificial Intelligence, May-19-2025

The intrinsic complexity of human biology presents ongoing challenges to scientific understanding. Researchers collaborate across disciplines to expand our knowledge of the biological interactions that define human life. AI methodologies have emerged as powerful tools across scientific domains, particularly in computational biology, where graph data structures effectively model biological entities such as protein-protein interaction (PPI) networks and gene functional networks. These networks serve as datasets for key network medicine tasks, such as gene-disease association prediction, drug repurposing, and polypharmacy side effect studies. Reliable predictions from machine learning models require high-quality foundational data. In this work, we present a comprehensive multi-purpose biological knowledge graph constructed by integrating and refining multiple publicly available datasets. Building upon the Drug Repurposing Knowledge Graph (DRKG), we define a pipeline tasked with a) cleaning inconsistencies and redundancies present in DRKG, b) coalescing information from the main available public data sources, and c) enriching the graph nodes with expressive feature vectors such as molecular fingerprints and gene ontologies. Biologically and chemically relevant features improve the capacity of machine learning models to generate accurate and well-structured embedding spaces. The resulting resource represents a coherent and reliable biological knowledge graph that serves as a state-of-the-art platform to advance research in computational biology and precision medicine. Moreover, it offers the opportunity to benchmark graph-based machine learning and network medicine models on relevant tasks. We demonstrate the effectiveness of the proposed dataset by benchmarking it on the tasks of drug repurposing, PPI prediction, and side-effect prediction, modeled as link prediction problems.

artificial intelligence, dataset, machine learning, (14 more...)

2505.11185

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

EventFlow: Forecasting Continuous-Time Event Data with Flow Matching

Kerrigan, Gavin, Nelson, Kai, Smyth, Padhraic

arXiv.org Machine Learning, Oct-9-2024

Continuous-time event sequences, in which events occur at irregular intervals, are ubiquitous across a wide range of industrial and scientific domains. The contemporary modeling paradigm is to treat such data as realizations of a temporal point process, and in machine learning it is common to model temporal point processes in an autoregressive fashion using a neural network. While autoregressive models are successful in predicting the time of a single subsequent event, their performance can be unsatisfactory in forecasting longer horizons due to cascading errors. We propose EventFlow, a non-autoregressive generative model for temporal point processes. Our model builds on the flow matching framework in order to directly learn joint distributions over event times, side-stepping the autoregressive process. EventFlow is likelihood-free, easy to implement and sample from, and either matches or surpasses the performance of state-of-the-art models in both unconditional and conditional generation tasks on a set of standard benchmarks. Many stochastic processes, ranging from consumer behavior (Hernandez et al., 2017) to the occurrence of earthquakes (Ogata, 1998), are best understood as a sequence of discrete events which occur at random times. Any observed event sequence, consisting of one or more event times, may be viewed as a draw from a temporal point process (TPP) (Daley & Vere-Jones, 2003) which characterizes the distribution over such sequences. Given a collection of observed event sequences, faithfully modeling the underlying TPP is critical in both understanding and forecasting the phenomenon of interest. While multiple different parametric TPP models have been proposed (Hawkes, 1971; Isham & Westcott, 1979), their limited flexibility limits their application when modeling complex real-world sequences. This has motivated the use of neural networks (Du et al., 2016; Mei & Eisner, 2017) in modeling TPPs.

dataset, point process, sequence, (16 more...)

2410.0743

Country: North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Sharing Parameter by Conjugation for Knowledge Graph Embeddings in Complex Space

Feng, Xincan, Qu, Zhi, Cheng, Yuchang, Watanabe, Taro, Yugami, Nobuhiro

arXiv.org Artificial Intelligence, Apr-17-2024

A Knowledge Graph (KG) is the directed graphical representation of entities and relations in the real world. KGs can be applied in diverse Natural Language Processing (NLP) tasks where knowledge is required. The need to scale up and complete KGs automatically yields Knowledge Graph Embedding (KGE), a shallow machine learning model that suffers from high memory and training-time costs. To mitigate the computational load, we propose a parameter-sharing method, i.e., using conjugate parameters for complex numbers employed in KGE models. Our method improves memory efficiency by 2x in relation embedding while achieving comparable performance to the state-of-the-art non-conjugate models, with faster, or at least comparable, training time. We demonstrate the generalizability of our method on two best-performing KGE models, $5^{\bigstar}\mathrm{E}$ and $\mathrm{ComplEx}$, on five benchmark datasets.
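One possible reading of the conjugate parameter sharing described above is sketched below using the standard ComplEx scoring function, Re(sum_i h_i r_i conj(t_i)). The layout, in which the second half of the relation embedding is the conjugate of the first half, is an assumption for illustration; the abstract does not specify how the shared parameters are arranged, and the function names and toy values are hypothetical.

```python
def expand_relation(half):
    """Build a full relation embedding from half as many stored parameters.

    Assumed layout: the second half is the complex conjugate of the first half.
    """
    return list(half) + [z.conjugate() for z in half]

def complex_score(head, relation, tail):
    """ComplEx-style score: real part of sum_i head_i * relation_i * conj(tail_i)."""
    total = sum(h * r * t.conjugate() for h, r, t in zip(head, relation, tail))
    return total.real

# Toy values (made up): 2 stored complex parameters give a 4-dimensional relation.
stored = [1 + 2j, 0.5 - 1j]
relation = expand_relation(stored)
head = [1 + 0j, 0 + 1j, 1 + 1j, 2 + 0j]
tail = [1 + 0j, 1 + 0j, 0 + 1j, 1 + 0j]
score = complex_score(head, relation, tail)
```

Under this layout only `len(stored)` complex numbers are kept per relation while the scoring function still sees a vector twice that long, which is where a 2x saving in relation-embedding memory would come from.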

dataset, hyperparameter, relation, (15 more...)

2404.11809

Country:

Asia > China > Beijing > Beijing (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.82)

Stock Recommendations for Individual Investors: A Temporal Graph Network Approach with Diversification-Enhancing Contrastive Learning

Lee, Youngbin, Kim, Yejin, Lee, Yongjae

arXiv.org Artificial Intelligence, Mar-27-2024

In complex financial markets, recommender systems can play a crucial role in empowering individuals to make informed decisions. Existing studies predominantly focus on price prediction, but even the most sophisticated models cannot accurately predict stock prices. Also, many studies show that most individual investors do not follow established investment theories because they have their own preferences. Hence, the tricky point in stock recommendation is that recommendations should give good investment performance but also should not ignore individual preferences. To develop effective stock recommender systems, it is essential to consider three key aspects: 1) individual preferences, 2) portfolio diversification, and 3) the temporal aspect of both stock features and individual preferences. In response, we develop the portfolio temporal graph network recommender PfoTGNRec, which can handle time-varying collaborative signals and incorporates diversification-enhancing contrastive learning. As a result, our model demonstrated superior performance compared to various baselines, including cutting-edge dynamic embedding models and existing stock recommendation models, in the sense that our model exhibited good investment performance while remaining competitive in capturing individual preferences. The source code and data are available at https://anonymous.4open.science/r/IJCAI2024-12F4.

investment performance, portfolio, recommendation, (15 more...)

2404.07223

Country: Asia > South Korea > Ulsan > Ulsan (0.04)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning

Dankar, Apoorv, Jassani, Adeem, Kumar, Kartikaeya

arXiv.org Artificial Intelligence, Aug-26-2023

The use of large transformer-based models such as BERT, GPT, and T5 has led to significant advancements in natural language processing. However, these models are computationally expensive, necessitating model compression techniques that reduce their size and complexity while maintaining accuracy. This project investigates and applies knowledge distillation for BERT model compression, specifically focusing on the TinyBERT student model. We explore various techniques to improve knowledge distillation, including experimentation with loss functions, transformer layer mapping methods, and tuning the weights of the attention and representation losses, and we evaluate our proposed techniques on a selection of downstream tasks from the GLUE benchmark. The goal of this work is to improve the efficiency and effectiveness of knowledge distillation, enabling the development of more efficient and accurate models for a range of natural language processing tasks.
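The idea of tuning the weights of the attention and representation losses can be sketched as a weighted sum of two matching terms. This is only an illustration under stated assumptions: mean squared error is used for both terms, the student and teacher tensors are flattened to equal-length lists (in practice hidden sizes usually differ and need a projection), and the names `attn_weight` and `rep_weight` are hypothetical rather than taken from the paper.

```python
def mse(a, b):
    """Mean squared error between two equal-length flat lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student_attn, teacher_attn, student_rep, teacher_rep,
                      attn_weight=1.0, rep_weight=1.0):
    """Weighted sum of an attention-matching term and a representation-matching term."""
    attn_term = mse(student_attn, teacher_attn)
    rep_term = mse(student_rep, teacher_rep)
    return attn_weight * attn_term + rep_weight * rep_term

# Toy values (made up), already flattened.
student_attn, teacher_attn = [0.2, 0.8], [0.4, 0.6]
student_rep, teacher_rep = [1.0, 0.0, -1.0], [1.0, 1.0, -1.0]
loss = distillation_loss(student_attn, teacher_attn, student_rep, teacher_rep,
                         attn_weight=0.5, rep_weight=2.0)
```

Treating the two weights as hyperparameters makes it possible to sweep them and compare downstream task scores for each setting.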

experiment, machine learning, natural language, (16 more...)

2308.13958

Country:

North America > Canada > Ontario > Toronto (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Data Cross-Segmentation for Improved Generalization in Reinforcement Learning Based Algorithmic Trading

Duvvur, Vikram, Mehta, Aashay, Sun, Edward, Wu, Bo, Chan, Ken Yew, Schneider, Jeff

arXiv.org Artificial Intelligence, Jul-18-2023

The use of machine learning in algorithmic trading systems is increasingly common. In a typical set-up, supervised learning is used to predict the future prices of assets, and those predictions drive a simple trading and execution strategy. This is quite effective when the predictions have sufficient signal, markets are liquid, and transaction costs are low. However, those conditions often do not hold in thinly traded financial markets and markets for differentiated assets such as real estate or vehicles. In these markets, the trading strategy must consider the long-term effects of taking positions that are relatively more difficult to change. In this work, we propose a Reinforcement Learning (RL) algorithm that trades based on signals from a learned predictive model and addresses these challenges. We test our algorithm on 20+ years of equity data from Bursa Malaysia.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2307.09377

Country: Asia > Malaysia (0.24)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Trading (1.00)
Materials > Chemicals > Industrial Gases > Liquified Gas (0.32)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.32)
Energy > Oil & Gas > Midstream (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

CDMA: A Practical Cross-Device Federated Learning Algorithm for General Minimax Problems

Xie, Jiahao, Zhang, Chao, Shen, Zebang, Liu, Weijie, Qian, Hui

arXiv.org Artificial Intelligence, Jun-28-2023

Minimax problems arise in a wide range of important applications including robust adversarial learning and Generative Adversarial Network (GAN) training. Recently, algorithms for minimax problems in the Federated Learning (FL) paradigm have received considerable interest. Existing federated algorithms for general minimax problems require full aggregation (i.e., aggregation of local model information from all clients) in each training round. Thus, they are inapplicable to an important setting of FL known as the cross-device setting, which involves numerous unreliable mobile/IoT devices. In this paper, we develop the first practical algorithm, named CDMA, for general minimax problems in the cross-device FL setting. CDMA is based on a Start-Immediately-With-Enough-Responses mechanism, in which the server first signals a subset of clients to perform local computation and then starts to aggregate the local results reported by clients once it receives responses from enough clients in each round. With this mechanism, CDMA is resilient to low client availability. In addition, CDMA incorporates a lightweight global correction in the local update steps of clients, which mitigates the impact of slow network connections. We establish theoretical guarantees of CDMA under different choices of hyperparameters and conduct experiments on AUC maximization, robust adversarial network training, and GAN training tasks. Theoretical and experimental results demonstrate the efficiency of CDMA.
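The Start-Immediately-With-Enough-Responses mechanism can be illustrated with a small simulation of a single round. This is a sketch of the aggregation trigger only: the local minimax updates and the global correction are omitted, plain averaging stands in for the server's aggregation rule, and the function name, the `enough` threshold, and the toy arrival times are all assumptions.

```python
def run_round(signalled, responses, enough):
    """Aggregate as soon as `enough` signalled clients have reported.

    signalled: ids of clients asked to compute this round.
    responses: (arrival_time, client_id, value) tuples; clients that drop out
               simply never appear in this list.
    Returns (mean of the first `enough` values, ids used), or None if too few
    clients responded.
    """
    arrived = sorted(r for r in responses if r[1] in signalled)
    if len(arrived) < enough:
        return None
    used = arrived[:enough]
    mean = sum(value for _, _, value in used) / enough
    return mean, [client_id for _, client_id, _ in used]

# Toy round (made up): four clients are signalled, c2 never responds.
signalled = {"c1", "c2", "c3", "c4"}
responses = [(0.9, "c3", 4.0), (0.2, "c1", 1.0), (0.5, "c4", 3.0)]
result = run_round(signalled, responses, enough=2)
too_few = run_round(signalled, responses, enough=4)
```

With `enough=2` the round completes using the two earliest responders and does not wait for the slow or missing clients, whereas requiring all four, as full aggregation would, never completes here.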

artificial intelligence, machine learning, (16 more...)

doi: 10.1609/aaai.v37i9.26246

2105.14216

Country:

Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
